Abstract

A uniform random intersection graph $G(n, m, k)$ is a random graph
constructed as follows. Label each of $n$ nodes by a randomly chosen
set of $k$ distinct colours taken from some finite set of possible colours
of size $m$. Nodes are joined by an edge if and only if some colour
appears in both their labels. These graphs arise in the study of the
security of wireless sensor networks, in particular when modelling the
network graph of the well known key predistribution technique due to
Eschenauer and Gligor.

The paper determines the threshold for connectivity of the graph
$G(n, m, k)$ when $n \to \infty$ in many situations. For example, when $k$ is
a function of $n$ such that $k \geq 2$ and $m = \lfloor n^\alpha \rfloor$ for some fixed
positive real number $\alpha$, then $G(n, m, k)$ is almost surely connected when

$$\liminf_{n \to \infty} \frac{k^2 n}{m \log n} > 1,$$

and $G(n, m, k)$ is almost surely disconnected when

$$\limsup_{n \to \infty} \frac{k^2 n}{m \log n} < 1.$$

Keywords: random intersection graph; key predistribution; wireless
sensor network.
1 Introduction



1.1 Notation and motivation 

The uniform random intersection graph $G(n, m, k)$ is a random graph
defined as follows. Let $V$ be a set of $n$ nodes, and let $M$ be a set of $m$
colours. To each node $v \in V$ we assign a subset $F_v \subseteq M$ of $k$ distinct
colours, chosen uniformly and independently at random from the $k$-subsets
of $M$. We join distinct nodes $u, v \in V$ by an edge if and only if
$F_u \cap F_v \neq \emptyset$. This paper
studies the connectivity threshold of uniform random intersection graphs.
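
The definition above translates directly into a sampling procedure. The following is a minimal sketch for experimentation, not part of the paper: the function names are hypothetical, colours are represented by the integers 0, ..., m-1, and connectivity is tested by a plain depth-first search.

```python
import random
from itertools import combinations


def sample_labels(n, m, k, rng):
    """Assign each of n nodes a uniformly random k-subset of the colours 0..m-1."""
    return [frozenset(rng.sample(range(m), k)) for _ in range(n)]


def edges(labels):
    """Join distinct nodes u < v exactly when their colour sets intersect."""
    return {(u, v) for u, v in combinations(range(len(labels)), 2)
            if labels[u] & labels[v]}


def is_connected(n, edge_set):
    """Depth-first search from node 0 over the given edge set."""
    adj = {v: set() for v in range(n)}
    for u, v in edge_set:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n
```

For instance, `is_connected(n, edges(sample_labels(n, m, k, random.Random(0))))` reports whether one sampled instance of $G(n, m, k)$ is connected.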

The study of $G(n, m, k)$ is motivated by an application to wireless sensor
networks (WSNs). A WSN is a collection of (usually very small) sensor
devices that are able to communicate wirelessly. Sample applications where 
WSNs might be used include disaster recovery, wildlife monitoring and mil- 
itary situations. Sensors' computational abilities are assumed to be severely 
limited by their size and battery life. The sensor network is designed to be 
deployed in an unstructured environment (sensors might be scattered from 
an aeroplane, for example). On deployment the individual sensors need to 
form a secure wireless network that is connected, but should also be robust
against the compromise of individual sensors' secret data due to malfunction
or capture. The classic WSN technique to accomplish this is due to
Eschenauer and Gligor [6]: each sensor is preloaded with k distinct encryp- 
tion keys, randomly taken from a pool of m possible keys. Two sensors can 
form a secure link if they are within wireless communication range and they 
share one or more encryption keys. The uniform random intersection graph 
models this situation in the case when all sensors are within communication 
range. (In the terminology of the subject, a uniform random intersection 
graph is a network graph for Eschenauer-Gligor key predistribution). 

The application requires the network to be connected with high
probability. Looking at other results in random graph theory, we would expect
the parameters $n$, $m$ and $k$ to exhibit a threshold behaviour with respect
to connectivity: for most parameters we would expect that the probability
that $G(n, m, k)$ is connected is either very high or very low. It is
important to understand the connectivity threshold (the area of the parameter
space bordering the regions of low and high connectivity probability) as
precisely as possible, as this threshold affects the choice of parameters in
the Eschenauer-Gligor scheme. Eschenauer and Gligor, and most of the
subsequent WSN literature, model the uniform random intersection graph as a
classical Erdős-Rényi random graph $G(n, p)$, a graph with $n$ vertices whose
edges are chosen randomly and independently with a fixed probability $p$.
They then use the asymptotic behaviour of Erdős-Rényi random graphs to
find good parameters for the scheme. For distinct nodes $u, v \in G(n, m, k)$,
the probability that $uv$ is an edge is $p$, where

$$p = 1 - \frac{\binom{m-k}{k}}{\binom{m}{k}} \approx \frac{k^2}{m}.$$

(To see why this approximation holds, note that $u$ is assigned $k$ colours and
the probability that each colour is assigned to $v$ is $k/m$.) So the WSN
literature models $G(n, m, k)$ by the Erdős-Rényi random graph $G(n, p)$ where
$p = k^2/m$. It is well known that the connectivity threshold of $G(n, p)$
occurs when $p \sim (\log n)/n$. So modelling $G(n, m, k)$ as an Erdős-Rényi
random graph predicts that the connectivity threshold lies at the point when
$k^2/m \sim (\log n)/n$. Though simulations support this threshold, modelling
$G(n, m, k)$ in this way is unsatisfactory since the behaviour of $G(n, p)$ and
$G(n, m, k)$ is sometimes radically different. For example, we expect many
more triangles in $G(n, m, k)$ than in $G(n, p)$, especially when $k$ is small.
(When $u, v, w \in G(n, m, 2)$ are distinct vertices such that $uv$ and $vw$ are
edges, then the probability that $uw$ is an edge is more than 1/2, since this
is the probability that $v$ shares the same colour with both $u$ and $w$.)
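
The quality of the approximation $p \approx k^2/m$ is easy to inspect numerically. The sketch below is an illustration only (the function names are hypothetical): it evaluates the exact edge probability $1 - \binom{m-k}{k}/\binom{m}{k}$ and the approximation $k^2/m$ side by side.

```python
from math import comb


def edge_probability(m, k):
    """Exact probability that two independent uniform k-subsets of an m-set intersect."""
    return 1 - comb(m - k, k) / comb(m, k)


def approx_edge_probability(m, k):
    """The first-order approximation k^2 / m used in the WSN literature."""
    return k * k / m
```

By the union bound the exact value never exceeds $k^2/m$, and the two values are close when $k^2$ is small compared with $m$.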

1.2 Our results 

Let $k$ and $m$ be functions of $n$. Our proof techniques and results depend
heavily on whether $m \geq n$ or not, so we discuss these two cases separately.

Suppose that $m \geq n$. We will show (Theorem 5) that $G(n, m, k)$ is
asymptotically almost surely connected when
$\liminf_{n \to \infty} k^2 n/(m \log n) > 1$.
(By an event occurring asymptotically almost surely, we mean that the
probability of the event tends to 1 as $n \to \infty$.) This threshold is tight:
we will show that $G(n, m, k)$ is asymptotically almost surely disconnected
when $\limsup_{n \to \infty} k^2 n/(m \log n) < 1$. Di Pietro, Mancini, Mei, Panconesi
and Radhakrishnan [4, 5] give a weaker form of Theorem 5: that $G(n, m, k)$
is almost surely connected when $\liminf_{n \to \infty} k^2 n/(m \log n) > 8$. (The
journal version of their paper [5] only claims that $G(n, m, k)$ is almost surely
connected when $\liminf_{n \to \infty} k^2 n/(m \log n) > 17$.) Di Pietro et al. also
observe that $G(n, m, k)$ is almost surely disconnected when
$k^2 n/(m \log n) \to 0$
as $n \to \infty$. Part of our proof of Theorem 5 is inspired by their techniques.
We comment that there is a gap we are unable to bridge in their proof,
which means that we take a subtly different approach to theirs: we discuss
this at the end of Section 4.



We now turn to the case when $m < n$. We show (see Section 3) that
whenever $(4n/m) - \log n \to \infty$ as $n \to \infty$ then $G(n, m, k)$ is
asymptotically almost surely connected. We will show (see Theorem 3 below)
that this threshold is tight in the case when $k = 2$. This settles the case,
for example, when $m = o(n/\log n)$. We note that this case is also a
consequence of recent work of Godehardt, Jaworski and Rybarczyk [9], who show
that when $k$ is fixed, $G(n, m, k)$ is asymptotically almost surely connected
whenever $n$ is a function of $m$ such that $(kn/m) - \log m \to \infty$ as
$m \to \infty$. We believe that
their result is not tight: see Section 5 for a discussion.

This leaves a narrow range of parameters not covered by our results, 
when m grows just a little more slowly than n. Though this range is too small 
to be of significance in applications, there are some interesting mathematical 
questions here. We comment on this in the final section of the paper. By 
constraining $m$ to be of the form $m = \lfloor n^\alpha \rfloor$ where $\alpha$ is a fixed positive real
number, we avoid this gap and obtain the following easy to state summary 
of our results: 

Theorem 1. Let $\alpha \in \mathbb{R}$ be positive. Let $k$ and $m$ be functions of $n$ such
that $k \geq 2$ and $m = \lfloor n^\alpha \rfloor$.

(i) Suppose that

$$\liminf_{n \to \infty} \frac{k^2 n}{m \log n} > 1. \qquad (1)$$

Then asymptotically almost surely $G(n, m, k)$ is connected.

(ii) Suppose that

$$\limsup_{n \to \infty} \frac{k^2 n}{m \log n} < 1.$$

Then asymptotically almost surely $G(n, m, k)$ is not connected.
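
To get a feel for the size of $k$ that Theorem 1 suggests, one can evaluate the ratio $k^2 n/(m \log n)$ for concrete parameters. The sketch below is purely illustrative: the theorem is asymptotic, so a ratio slightly above 1 at a finite $n$ guarantees nothing, and the function names are hypothetical.

```python
from math import log, sqrt


def threshold_ratio(n, m, k):
    """The quantity k^2 n / (m log n) that appears in Theorem 1."""
    return k * k * n / (m * log(n))


def smallest_k_above_threshold(n, m):
    """Smallest integer k >= 2 for which the ratio strictly exceeds 1."""
    k = max(2, int(sqrt(m * log(n) / n)))
    while threshold_ratio(n, m, k) <= 1:
        k += 1
    return k
```

For example, with $n = 10^4$ nodes and a pool of $m = 10^6$ keys, the ratio first exceeds 1 at $k = 31$.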



1.3 Related results 

Other properties of $G(n, m, k)$ besides connectivity have been studied. For
example, Godehardt and Jaworski [8] have results on the distribution of the
number of isolated vertices of $G(n, m, k)$ when $nk^2/(m \log n)$ tends to a
constant; Bloznelis, Jaworski and Rybarczyk determine the emergence of the
giant component when $n(\log n)^2 = o(m)$; Jaworski, Karoński and Stark [10]
study the vertex degree distribution of random intersection graphs.

A related, non-uniform, definition of a random intersection graph has
been studied as part of the modelling of clustering in real-world networks
(see [U [3 \TT\ [12] , for example). We define the (non-uniform) random
intersection graph $G(n, m, p)$ exactly as in the definition of $G(n, m, k)$ above,
except we choose the subsets $F_v$ differently: each $F_v$ is constructed by the
rule that each colour $c \in M$ lies in $F_v$ independently with probability $p$.
(Thus the set $F_v$ is likely to vary in size as $v$ varies, and will have expected
size $pm$.) In her thesis, Singer-Cohen [11] establishes connectivity thresholds
for $G(n, m, p)$. To compare her results with Theorem 1, consider the case
when $p = k/m$, so the expected size of a set $F_v$ is $k$. When $\alpha > 1$,
Singer-Cohen shows that the connectivity threshold lies at
$p = \sqrt{(\log n)/(nm)}$, which
agrees with the threshold of Theorem 1 (though Singer-Cohen's threshold
is sharper). In fact, when $m$ is large compared to $n$ this agreement is a
consequence of standard concentration results. When $\alpha < 1$, Singer-Cohen
shows that the connectivity threshold lies at $p = (\log n)/m$, which is much
higher than the threshold of Theorem 1. The intuition here is that when $m$
is small there are some nodes $v$ in $G(n, m, p)$ with $F_v$ much smaller than $pm$
(indeed, $F_v$ may even be empty). It is these nodes that provide the
dominant obstacle to connectivity in $G(n, m, p)$ when $\alpha < 1$. This also shows
that $G(n, m, p)$ may behave differently to $G(n, m, k)$.
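
The remark about empty colour sets can be made slightly more concrete by a short first-moment calculation. This is only a heuristic sketch under the assumption that $p^2 m \to 0$; it is not taken from Singer-Cohen's thesis. In $G(n, m, p)$,

$$\Pr(F_v = \emptyset) = (1 - p)^m = \exp\bigl(-pm + O(p^2 m)\bigr), \qquad E\,\bigl|\{v \in V : F_v = \emptyset\}\bigr| = n(1 - p)^m.$$

Taking $p = c(\log n)/m$ for a constant $c > 0$, the expected number of nodes with an empty colour set is $n^{1 - c + o(1)}$, which tends to infinity when $c < 1$ and to 0 when $c > 1$. A node with an empty colour set is isolated, so this is at least consistent with a threshold at $p = (\log n)/m$. No such nodes exist in $G(n, m, k)$, where every colour set has size exactly $k$.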

1.4 The structure of the paper 

The remainder of the paper is structured as follows. Section 2 establishes
the threshold for the existence of isolated vertices in $G(n, m, k)$, using the
first and second moment methods; this result is sufficient to establish
Theorem 1 (ii). Section 3 specialises to the case when $k = 2$, and proves
Theorem 1 (i) when $\alpha < 1$. Section 4 proves Theorem 1 (i) when
$\alpha \geq 1$. Finally,
Section 5 discusses prospects of establishing tighter connectivity thresholds
for $G(n, m, k)$.

2 Isolated vertices 

We aim to prove the following theorem on the probability of an isolated
vertex appearing in $G(n, m, k)$.

Theorem 2. Let $k$ and $m$ be functions of $n$.

(i) Suppose that

$$\frac{k^2 n}{m} = \log n + \omega \qquad (2)$$

where $\omega \to \infty$ as $n \to \infty$. Then almost surely $G(n, m, k)$ does not
contain an isolated vertex.

(ii) Suppose that

$$\frac{k^2 n}{m} = \log n - \omega \qquad (3)$$

where $\omega \to \infty$ as $n \to \infty$. Then almost surely $G(n, m, k)$ contains an
isolated vertex.

The proof of this theorem is an application of standard techniques from
random graph theory: we include the proof for completeness. We remark
that Godehardt and Jaworski have much stronger results on the distribution
of the number of isolated vertices on the threshold: in particular, they
determine the distribution when $(k^2 n/m) - \log n \to c$ for some constant $c$;
see [8] for a statement of their results. Note that (in contrast to many
situations in random graph theory) it is not at all clear that Theorem 2
immediately follows from their result: problems occur with a reduction as, for
example, $k$ has to be an integer and if one changes $k$ by 1 then $k^2 n/m$ may
vary by a factor greater than $\log n$.

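Proof. For $v \in V$, define the random variable $X_v$ by

$$X_v = \begin{cases} 1 & \text{if } v \text{ is isolated,} \\ 0 & \text{otherwise.} \end{cases}$$

Define $X = \sum_{v \in V} X_v$. So $E(X)$ is the expected number of isolated vertices
in $G(n, m, k)$. Note that, by linearity of expectation, $E(X) = nE(X_u)$,
where $u \in V$ is any fixed vertex. A vertex $u$ is isolated if and only if
$F_v \cap F_u = \emptyset$ for all $v \in V \setminus \{u\}$. Hence

$$E(X) = n \left( \frac{\binom{m-k}{k}}{\binom{m}{k}} \right)^{n-1}.$$

Suppose that (2) holds. We show that then $E(X) \to 0$, and the result
follows by Markov's inequality. We have that

$$E(X) = n \left( \prod_{i=0}^{k-1} \left( 1 - \frac{k}{m-i} \right) \right)^{n-1} \leq n \left( 1 - \frac{k}{m} \right)^{k(n-1)} \leq n \exp\left( -\frac{k^2 (n-1)}{m} \right) = \exp(-\omega + o(\omega)) \quad \text{by (2).}$$

So $E(X) \to 0$, as required.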
We now aim to prove Part (ii) of the theorem using the second moment
method. Note first that (3) implies that $k = o(m)$, and thus for sufficiently
large $n$

$$\frac{k}{m-k} \sqrt{k(n-1)} \leq \frac{2 k^2 n}{m \sqrt{n}} = o(1).$$

Since $(1 - p)^x = \exp(-px + o(1))$ whenever $p\sqrt{x} = o(1)$ we have

$$E(X) = n \left( \prod_{i=0}^{k-1} \left( 1 - \frac{k}{m-i} \right) \right)^{n-1} \geq n \left( 1 - \frac{k}{m-k} \right)^{k(n-1)}$$

$$= n \exp\left( -\frac{k^2 n}{m-k} + o(1) \right) = n \exp\left( -\frac{k^2 n}{m} + o(1) \right)$$

$$= \exp(\omega + o(1))$$

which tends to infinity as $n \to \infty$. The second moment method now implies
the result we require, provided that we can show that $\mathrm{Var}(X) \ll E(X)^2$.
Now

$$\mathrm{Var}(X) = E(X^2) - E(X)^2 \geq 0,$$

and so it suffices to show that $E(X^2) = (1 + o(1))E(X)^2$. Note that

$$E(X^2) = E(X) + n(n-1)E(X_{u_1} X_{u_2})$$

where $u_1, u_2$ are fixed distinct vertices. Since $E(X) \to \infty$, it therefore
suffices to prove that

$$\frac{n(n-1)E(X_{u_1} X_{u_2})}{E(X)^2} \to 1 \text{ as } n \to \infty. \qquad (4)$$

Note that $X_{u_1} X_{u_2}$ takes the value 1 exactly when $u_1$ and $u_2$ are both isolated.
For $u_1$ and $u_2$ to both be isolated, $F_{u_1}$ and $F_{u_2}$ should be disjoint (so there
is no edge between $u_1$ and $u_2$) and for all $v \in V \setminus \{u_1, u_2\}$ we must have
that $F_v$ is disjoint from $F_{u_1} \cup F_{u_2}$ (so there is no edge from $v$ to either of
$u_1$ or $u_2$). Thus

$$E(X_{u_1} X_{u_2}) = \frac{\binom{m-k}{k}}{\binom{m}{k}} \left( \frac{\binom{m-2k}{k}}{\binom{m}{k}} \right)^{n-2} = \exp\left( -\frac{2k^2 n}{m} + o(1) \right)$$

as before. Since we proved above that

$$E(X) = n \exp\left( -\frac{k^2 n}{m} + o(1) \right)$$

we see that (4) holds, as required. □
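
The first-moment quantity in this proof is simple to evaluate. The following sketch (hypothetical function names, offered only as a numerical illustration) compares the exact expected number of isolated vertices, $n\bigl(\binom{m-k}{k}/\binom{m}{k}\bigr)^{n-1}$, with the approximation $n\exp(-k^2 n/m)$ that drives both parts of Theorem 2.

```python
from math import comb, exp


def expected_isolated(n, m, k):
    """Exact expected number of isolated vertices in G(n, m, k)."""
    return n * (comb(m - k, k) / comb(m, k)) ** (n - 1)


def expected_isolated_approx(n, m, k):
    """The approximation n * exp(-k^2 n / m)."""
    return n * exp(-k * k * n / m)
```

With $n = 1000$, $m = 10000$ and $k = 8$ (so that $k^2 n/m = 6.4$ is a little below $\log n \approx 6.9$), both expressions are about 1.66.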

3 The case when k = 2 or m = o(n/log n)

In this section we prove the following theorem concerning the case when
each vertex is assigned a set of colours of size two.

Theorem 3. Let $m$ be a function of $n$.

(i) Suppose that

$$\frac{4n}{m} = \log n + \omega \qquad (5)$$

where $\omega \to \infty$ as $n \to \infty$. Then almost surely $G(n, m, 2)$ is connected.

(ii) Suppose that

$$\frac{4n}{m} = \log n - \omega$$

where $\omega \to \infty$ as $n \to \infty$. Then almost surely $G(n, m, 2)$ is not
connected.

We remark that this theorem implies that $G(n, m, k)$ is asymptotically
almost surely connected whenever $m = o(n/\log n)$ (and, in particular,
Theorem 3 implies Theorem 1 holds when $\alpha < 1$). To see this, we first choose
2 colours for each vertex from the $m$ available colours uniformly at
random to obtain an instance of $G(n, m, 2)$. As $m = o(4n/\log n)$ we have
$\log n = o(4n/m)$ and thus by Theorem 3 the graph $G(n, m, 2)$ is
asymptotically almost surely connected. If we now choose $k - 2$ more colours for
each vertex from the remaining available colours uniformly at random then
each vertex has been assigned $k$ colours uniformly at random, and so we
have obtained an instance of $G(n, m, k)$. Moreover the newly chosen colours
can only add edges to the graph and thus the instance of $G(n, m, k)$ is more
likely to be connected than the instance of $G(n, m, 2)$.
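
The coupling in this remark can be sketched in a few lines. The code below is an illustration with hypothetical names: it extends given 2-element colour sets to $k$-element sets by sampling the extra colours uniformly from the colours not yet used by that vertex, so every edge of the 2-colour instance is still present afterwards.

```python
import random
from itertools import combinations


def extend_labels(labels, m, k, rng):
    """Add uniformly chosen extra colours to each label until it has size k."""
    extended = []
    for s in labels:
        remaining = [c for c in range(m) if c not in s]
        extended.append(frozenset(s) | frozenset(rng.sample(remaining, k - len(s))))
    return extended


def edge_set(labels):
    """Pairs of distinct nodes whose colour sets intersect."""
    return {(u, v) for u, v in combinations(range(len(labels)), 2)
            if labels[u] & labels[v]}
```

Because each new label contains the old one, `edge_set(labels) <= edge_set(extend_labels(labels, m, k, rng))` always holds, which is the monotonicity used above.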

To prove Theorem 3 we first prove the following lemma which says that
we only have to consider values of $m$ that are not too small compared with $n$.

Lemma 4. It is sufficient to prove Part (i) of Theorem 3 in the case when

$$\frac{n}{m \log n} \leq 1. \qquad (6)$$
Proof. Suppose that we have proved Part (i) of Theorem 3 under the
additional assumption (6). Suppose that (6) is not satisfied. To prove the
lemma, it is sufficient to show that we may replace $m$ by a larger function
$m'$ of $n$ such that $\frac{4n}{m'} - \log n \to \infty$ and $\frac{4n}{m' \log n} \leq 4$, with the
property that
$G(n, m', 2)$ is less likely to be connected than $G(n, m, 2)$.

Define $m'$ by setting $m' = m$ whenever (6) is satisfied; otherwise let $\ell$ be
the unique positive integer such that

$$2 < \frac{4n}{2^\ell m \log n} \leq 4$$

and define $m' = 2^\ell m$. Note that

$$\frac{4n}{m'} - \log n \to \infty$$

as $n \to \infty$ since whenever $m \neq m'$ we have that

$$\frac{4n}{m'} = \frac{4n}{2^\ell m \log n} \log n > 2 \log n,$$

by our choice of $\ell$.

It remains to show that $G(n, m', 2)$ is less likely to be connected than
$G(n, m, 2)$.

Let $M'$ be a set of $m'$ colours. Partition $M'$ into $m$ classes, each of
size $2^\ell$. Identify the set $M$ of $m$ colours with the classes of this partition.
We generate an instance of $G(n, m, 2)$ as follows. Firstly, we generate an
instance of $G(n, m', 2)$, so each node $v$ is assigned a set $F_v \subseteq M'$ of size 2.
Secondly, by replacing each colour by the class containing it we assign a
set of at most 2 colours from $M$ to each vertex. Thirdly, for those vertices
assigned only one colour from $M$, we assign an additional colour uniformly
and independently at random. Note that this process does indeed generate
an instance of $G(n, m, 2)$, since the vertices assigned one colour from $M$
in the second step are coloured uniformly and independently. To see that
$G(n, m, 2)$ is more likely to be connected than $G(n, m', 2)$, note that each of
the last two steps adds edges to the graph (where the adjacency relation of
the graph at the end of the second step is chosen to be the obvious one). □
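
The three-step construction in this proof can be sketched as follows. This is an illustration only: the fine colours are taken to be the integers $0, \dots, 2^\ell m - 1$, the class of a fine colour `c` is `c // 2**ell`, and the additional colour in the third step is assumed to be drawn uniformly from the other $m - 1$ classes. The function names are hypothetical.

```python
import random
from itertools import combinations


def collapse_labels(fine_labels, m, ell, rng):
    """Map 2-subsets of the fine palette to 2-subsets of the m colour classes."""
    size = 2 ** ell
    coarse = []
    for s in fine_labels:
        classes = {c // size for c in s}          # second step: replace colours by classes
        if len(classes) == 1:                     # third step: top up single-class labels
            (only,) = classes
            classes.add(rng.choice([d for d in range(m) if d != only]))
        coarse.append(frozenset(classes))
    return coarse


def edge_set(labels):
    """Pairs of distinct nodes whose colour sets intersect."""
    return {(u, v) for u, v in combinations(range(len(labels)), 2)
            if labels[u] & labels[v]}
```

Two vertices that share a fine colour also share its class, so every edge of the $G(n, m', 2)$ instance survives in the collapsed instance.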

Proof of Theorem 3. Part (ii) of Theorem 3 follows from Part (ii) of
Theorem 2, since a graph with an isolated vertex cannot be connected. So it
suffices to prove Part (i) of the theorem. Moreover by Lemma 4 we may
assume for the remainder of the proof that

$$\frac{4n}{m \log n} \leq 4. \qquad (7)$$
Given a graph $G(n, m, 2)$, we define the corresponding colour graph
$H(n, m, 2)$ as follows. The vertices of $H(n, m, 2)$ are the set $M$ of colours.
Two distinct vertices $x$ and $y$ of $H(n, m, 2)$ are connected by an edge if and
only if some vertex $v$ in $G(n, m, 2)$ is assigned the set $\{x, y\}$ of colours (in
other words, if there exists $v \in G(n, m, 2)$ such that $F_v = \{x, y\}$). Thus
$H(n, m, 2)$ has $m$ vertices and at most $n$ edges.

We claim that the colour graph $H(n, m, 2)$ asymptotically almost surely
contains at least $n - (\log n)^3$ edges. To prove the claim we define for any
two distinct vertices $u, v \in G(n, m, 2)$, a random variable $X_{u,v}$ by

$$X_{u,v} = \begin{cases} 0 & \text{if } F_u \neq F_v, \\ 1 & \text{if } F_u = F_v, \end{cases}$$

and let $X = \sum X_{u,v}$, where the sum is over all pairs of distinct vertices in
$G(n, m, 2)$. Now $E(X_{u,v}) = \binom{m}{2}^{-1}$, and so (7) and linearity of expectation
imply that

$$E(X) = \binom{n}{2} \binom{m}{2}^{-1} \leq 2(\log n)^2.$$

Markov's inequality now implies that

$$\Pr\bigl(X \geq (\log n)^3\bigr) \leq 2(\log n)^2/(\log n)^3 = 2(\log n)^{-1} \to 0,$$

and so asymptotically almost surely there are at most $(\log n)^3$ pairs $u, v$ of
vertices such that $F_u = F_v$. When $H(n, m, 2)$ has $n - i$ edges, there must
be at least $i$ pairs $u, v \in G(n, m, 2)$ with $F_u = F_v$. So the claim follows.

We say a graph is near connected if it consists of a connected
component together with a (possibly empty) set of isolated vertices. Note
that $G(n, m, 2)$ is connected if and only if the corresponding colour graph
$H(n, m, 2)$ is near connected. We may regard the edges of $H(n, m, 2)$ as
being obtained by sampling $n$ times with replacement from the set of edges
of the complete graph on $m$ vertices (with the uniform distribution). By our
claim asymptotically almost surely $H(n, m, 2)$ contains at least $n - (\log n)^3$
edges, and so we stop the process after we have sampled this number of
distinct edges to obtain a subgraph $H'$ of $H(n, m, 2)$. Note that $H'$ is chosen
uniformly from the set of all graphs on $m$ vertices with $n - \lfloor (\log n)^3 \rfloor$ edges.
Since the property of being near connected is monotone, Theorem 3 will
follow if we can show that $H'$ is almost surely near connected. A random
graph with $m$ vertices and $x$ edges is near connected whenever

$$x \geq \frac{m}{4} (\log m + \log\log m + \omega'), \qquad (8)$$
where $\omega' \to \infty$ as $m \to \infty$ (see Bollobás [3, Page 164]). Now,

$$\log m \leq \log 4 + \log n - \log\log n$$

since $4n/m \geq \log n$ by (5). Since $m \leq n$ whenever $m$ is sufficiently large,
we find that

$$\log n \geq \log m + \log\log n - \log 4 \geq \log m + \log\log m - \log 4.$$

If we set $x = n - \lfloor (\log n)^3 \rfloor$ we see that

$$\frac{4x}{m} \geq \frac{4n}{m} - \frac{4(\log n)^3}{m} = \log n + \omega - o(1) \quad \text{by (5) and (7)}$$

$$\geq \log m + \log\log m + \omega + O(1).$$

Thus $\frac{4x}{m} \geq \log m + \log\log m + \omega'$ where $\omega' \to \infty$ as $m \to \infty$, and
therefore (8) holds. So $H'$ is near connected, and the theorem follows. □
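
The correspondence between $G(n, m, 2)$ and its colour graph that underlies this proof can be checked on small random instances. The sketch below uses hypothetical names and a simple union-find; it builds both graphs from the same labels and tests connectivity of the first against near connectivity of the second.

```python
import random
from itertools import combinations


def components(vertices, edge_list):
    """Connected components, as a list of vertex sets, via union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edge_list:
        parent[find(a)] = find(b)
    groups = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def g_is_connected(labels):
    """Connectivity of the intersection graph on the labelled nodes."""
    n = len(labels)
    e = [(u, v) for u, v in combinations(range(n), 2) if labels[u] & labels[v]]
    return len(components(range(n), e)) == 1


def colour_graph_edges(labels):
    """Edges of the colour graph: one edge {x, y} per distinct label."""
    return {tuple(sorted(s)) for s in labels}


def h_is_near_connected(m, labels):
    """At most one component of the colour graph has more than one vertex."""
    comps = components(range(m), colour_graph_edges(labels))
    return sum(1 for c in comps if len(c) > 1) <= 1
```

On every instance the two predicates agree, and the colour graph has at most $n$ edges, with equality exactly when no two nodes receive the same label.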

4 The case when m ≥ n

Theorem 5. Let $k$ and $m$ be functions of $n$ such that $m \geq n$.

(i) Suppose that

$$\liminf_{n \to \infty} \frac{k^2 n}{m \log n} > 1. \qquad (9)$$

Then asymptotically almost surely $G(n, m, k)$ is connected.

(ii) Suppose that

$$\limsup_{n \to \infty} \frac{k^2 n}{m \log n} < 1.$$

Then asymptotically almost surely $G(n, m, k)$ is not connected.

Note that this theorem implies Theorem 1 holds in the case when $\alpha \geq 1$,
and so our proof of Theorem 1 is complete once we have proved this theorem.
As before, Part (ii) of Theorem 5 follows from Part (ii) of Theorem 2, since
a graph with an isolated vertex cannot be connected. So it suffices to prove
Part (i) of the theorem. Our proof of Part (i) parallels and tightens the
work of Di Pietro et al. [4].

If $G(n, m, k)$ is not connected, it has a component $S$ of size at most $n/2$.
Lemmas 6, 7 and 8 together show that the probability of such a component
$S$ existing tends to 0 as $n \to \infty$, and so the theorem will follow from these
three lemmas.

Note that (9) and the fact that $m \geq n$ together imply that $k \geq \sqrt{\log n}$
for all sufficiently large $n$. In particular, $k \to \infty$ as $n \to \infty$.

Lemma 6. Under the conditions of Part (i) of Theorem 5, $G(n, m, k)$
asymptotically almost surely contains no components of size $s$, with
$s \leq e n^{8/9}$.
Proof. We claim that it suffices to prove the lemma under the additional
assumption that

$$k^2 \leq \frac{4m \log n}{n}. \qquad (10)$$

For suppose we have proved the lemma under this additional assumption.
Given any $k$ satisfying (9), define $k'$ by

$$k' = \begin{cases} k & \text{if } k^2 \leq (4m \log n)/n, \\ \lfloor \sqrt{(4m \log n)/n} \rfloor & \text{otherwise.} \end{cases}$$

Since $2 \leq k' \leq k$, we may construct an instance of $G(n, m, k)$ by first
assigning $k'$ colours to each vertex to obtain an instance of $G(n, m, k')$,
and then assigning an additional $k - k'$ colours to each vertex to obtain an
instance of $G(n, m, k)$. Assigning the additional $k - k'$ colours can only add
edges to the graph, so the probability that $G(n, m, k)$ has no component of
order at most $e n^{8/9}$ is bounded below by the corresponding probability for
$G(n, m, k')$. Since $\liminf (k')^2 n/(m \log n) > 1$, the probability that
$G(n, m, k')$ has no component of order at most $e n^{8/9}$ tends to 1, by the
lemma under the additional assumption (10). So our claim follows.

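For a set $S$ of vertices of size $s$, let $A_S$ be the event that $S$ is a component
of $G(n, m, k)$. Choose a constant $0 < \varepsilon < 1$ such that

$$(1 - 2\varepsilon) \frac{k^2 n}{m \log n} > 1 \qquad (11)$$

for all sufficiently large $n$. Such a constant exists by (9). Let $B_S$ be the
event that fewer than $(1 - \varepsilon)ks$ colours are assigned to $S$. Note that

$$\Pr(A_S) = \Pr(B_S)\Pr(A_S \mid B_S) + \Pr(\overline{B_S})\Pr(A_S \mid \overline{B_S}) \leq \Pr(B_S) + \Pr(A_S \mid \overline{B_S}).$$

First, we shall give an upper bound on $\Pr(B_S)$. There are
$\binom{m}{\lfloor (1-\varepsilon)ks \rfloor}$ choices
for a set of $\lfloor (1 - \varepsilon)ks \rfloor$ colours; each of the $s$ vertices in $S$ is assigned a
subset of these colours with probability
$\binom{\lfloor (1-\varepsilon)ks \rfloor}{k} / \binom{m}{k}$. So

$$\Pr(B_S) \leq \binom{m}{\lfloor (1-\varepsilon)ks \rfloor} \left( \frac{\binom{\lfloor (1-\varepsilon)ks \rfloor}{k}}{\binom{m}{k}} \right)^{s} \leq \left( \frac{em}{(1-\varepsilon)ks} \right)^{(1-\varepsilon)ks} \left( \frac{(1-\varepsilon)ks}{m} \right)^{ks} \leq e^{ks} \left( \frac{ks}{m} \right)^{\varepsilon ks}.$$

By (10) and since $s \leq e n^{8/9}$ and $m \geq n$, we have

$$\frac{ks}{m} \leq \sqrt{\frac{4m \log n}{n}} \cdot \frac{e n^{8/9}}{m} \leq 2e\, n^{-1/9} \sqrt{\log n}.$$

Since $k \to \infty$ as $n \to \infty$ we have $\varepsilon k \to \infty$ and thus for sufficiently large $n$

$$\Pr(B_S) \leq \left( e^{1/\varepsilon} \frac{ks}{m} \right)^{\varepsilon ks} \leq n^{-2s}.$$

If $B_S$ does not occur, we may find a subset $K$ of colours of size
$\lceil (1 - \varepsilon)ks \rceil$
that have been assigned to $S$. For $S$ to be a component, each of the $n - s$
vertices not in $S$ must be assigned colours that are disjoint from $K$, and so

$$\Pr(A_S \mid \overline{B_S}) \leq \left( \frac{\binom{m - \lceil (1-\varepsilon)ks \rceil}{k}}{\binom{m}{k}} \right)^{n-s} \leq \left( 1 - \frac{(1-\varepsilon)ks}{m} \right)^{k(n-s)}$$

$$\leq \exp\left( -(1 - \varepsilon) \frac{s(n-s)}{n} \cdot \frac{k^2 n}{m} \right) \leq n^{-\frac{1-\varepsilon}{1-2\varepsilon} \cdot \frac{s(n-s)}{n}} \quad \text{by (11)},$$

which is at most $n^{-(1+\varepsilon)s}$ for sufficiently large $n$.

The probability that $G(n, m, k)$ has a component of size at most $e n^{8/9}$ is
bounded above by the following expression, where we sum over all subsets
$S$ of vertices of size at most $e n^{8/9}$:

$$\sum_S \Pr(A_S) \leq \sum_S \left( \Pr(B_S) + \Pr(A_S \mid \overline{B_S}) \right) \leq \sum_{s=1}^{\infty} \binom{n}{s} \left( n^{-2s} + n^{-(1+\varepsilon)s} \right) \leq \sum_{s=1}^{\infty} 2n^{-\varepsilon s} = \frac{2}{n^{\varepsilon} - 1},$$

which tends to zero as $n$ tends to infinity. □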



Lemma 7. Under the conditions of Part (i) of Theorem 5, $G(n, m, k)$
asymptotically almost surely contains no components of size $s$, where
$e n^{8/9} \leq s \leq \min\{m/k, n/2\}$.

Proof. Just as in the proof of Lemma 6, we may assume in addition that
the inequality (10) holds.

For a subset $S$ of vertices of size $s$, define $C_S$ to be the event that $S$ is
assigned at most $\frac{1}{4}ks$ colours. We proceed as in Lemma 6 with the event
$C_S$ replacing the event $B_S$. So the probability that $G(n, m, k)$ contains a
component of the size we are interested in is bounded above by

$$\sum_S \left( \Pr(C_S) + \Pr(A_S \mid \overline{C_S}) \right),$$

where we are summing over all subsets $S$ of vertices of size $s$, where
$e n^{8/9} \leq s \leq \min\{m/k, n/2\}$. We wish to prove that this sum tends to 0
as $n \to \infty$. We begin by showing that

$$\sum_S \Pr(C_S) \to 0$$

as $n \to \infty$. A similar argument to that in the proof of Lemma 6 shows that



But then 




We may write the summand in this last expression in the form $(x^x)^t$, where
$x = eks/4m$ and $t = 2m/e$. Since $x^x$ has no internal maxima (just a single
minimum at $x = e^{-1}$) our summand is maximized at the extremes of its
range. So our summand is bounded above by $\mu$ where

^ = '^''''\[-^) 'UJ 'UJ 

by (10) and since $k \to \infty$. Thus

$$\sum_S \Pr(C_S) \leq ((n/2) + 1)\mu = o(1),$$

as required.

The event $A_S$ requires that the colours assigned to the $n - s$ elements of
$V \setminus S$ are disjoint from the colours assigned to $S$ (for otherwise there would
be edges between $V \setminus S$ and $S$), and so if $C_S$ does not occur we see that



Hence




$$\leq \sum_s n^{s/9} n^{-s/8} \quad \text{(by (9))}$$

$$\leq \sum_{s=1}^{\infty} n^{-s/72} = \frac{1}{n^{1/72} - 1},$$

which tends to 0 as $n$ tends to infinity. □

Lemma 8. Under the conditions of Part (i) of Theorem 5, $G(n, m, k)$
asymptotically almost surely contains no components of size $s$, where
$m/k \leq s \leq n/2$.

Proof. We need to show that the probability that $G(n, m, k)$ has a
component of size $s \geq m/k$, where $s \leq n/2$, tends to 0. If $m/k > n/2$ this
probability is 0, so we assume that $m/k \leq n/2$.

Let $T$ be a set of vertices of size $\lceil m/k \rceil$. Let $D_T$ be the event that
there are at least $n/2$ vertices in $V \setminus T$ having no edges to $T$. Note that if
$G(n, m, k)$ contains a component $S$ of size $s$ where $m/k \leq s \leq n/2$, all the
events $D_T$ where $T \subseteq S$ occur (since $V \setminus S$ has size at least $n/2$, and the
vertices in $V \setminus S$ have no edges to $S$ and so in particular have no edges to
$T$). So the probability that $G(n, m, k)$ contains a component of size $s$ where
$m/k \leq s \leq n/2$ is bounded above by $\sum_T \Pr(D_T)$, where the sum is over all
subsets $T \subseteq V$ with $|T| = \lceil m/k \rceil$. Let $C_T$ be the event that $T$ is assigned
$m/4$ colours or fewer. We have that

$$\Pr(D_T) = \Pr(C_T)\Pr(D_T \mid C_T) + \Pr(\overline{C_T})\Pr(D_T \mid \overline{C_T}) \leq \Pr(C_T) + \Pr(D_T \mid \overline{C_T}),$$

and so it suffices to show that $\sum_T \Pr(C_T)$ and $\sum_T \Pr(D_T \mid \overline{C_T})$ both tend
to 0 as $n \to \infty$. Now,

$$\Pr(C_T) \leq \binom{m}{\lfloor m/4 \rfloor} \left( \frac{\binom{\lfloor m/4 \rfloor}{k}}{\binom{m}{k}} \right)^{\lceil m/k \rceil} \leq \binom{m}{\lfloor m/4 \rfloor} \left( \frac{m/4}{m} \right)^{k \lceil m/k \rceil}$$

$$\leq \left( \frac{me}{m/4} \right)^{m/4} \left( \frac{m/4}{m} \right)^{m} = (4e)^{m/4} 4^{-m}.$$

As $k \to \infty$, we may assume that $k \geq 4$ when $n$ is sufficiently large. So

$$\sum_T \Pr(C_T) \leq (4e)^{m/2} 4^{-m}.$$

Since $\sqrt{4e} < 4$, we see that this sum tends to 0 as $n \to \infty$.

Let $T$ be fixed, and let $v \in V \setminus T$. Let $E_v$ be the event that there are no
edges from $v$ to $T$. Then $\Pr(E_v)$ is equal to the probability that the colours
assigned to $v$ are disjoint from the colours assigned to $T$. Thus

$$\Pr(E_v \mid \overline{C_T}) \leq (4/5)^k \leq (4/5)^{\sqrt{\log n}}.$$

Note that the events $E_v$ are independent. The event $D_T$ occurs exactly
when $n/2$ or more of the events $E_v$ occur. So, writing $p = (4/5)^{\sqrt{\log n}}$, we
find

$$\Pr(D_T \mid \overline{C_T}) \leq \Pr\bigl(\mathrm{Bin}(n - \lceil m/k \rceil, \Pr(E_v \mid \overline{C_T})) \geq n/2\bigr)$$

$$\leq \Pr\bigl(\mathrm{Bin}(n, p) \geq n/2\bigr)$$

$$\leq \exp\bigl(n(\tfrac{1}{2} \log 2p + \tfrac{1}{2} \log(2(1 - p)))\bigr)$$

by the Chernoff bound (see Bollobás [3, Page 11])

$$\leq \exp\bigl(-\tfrac{1}{2} n \sqrt{\log n} \log(5/4) + O(n)\bigr).$$

Thus

$$\sum_T \Pr(D_T \mid \overline{C_T}) \leq 2^n \exp\bigl(-\tfrac{1}{2} n \sqrt{\log n} \log(5/4) + O(n)\bigr) = \exp\bigl(-\tfrac{1}{2} n \sqrt{\log n} \log(5/4) + O(n)\bigr) \to 0$$

as $n \to \infty$. So the lemma follows. □
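
The Chernoff estimate used in this proof, $\Pr(\mathrm{Bin}(n, p) \geq n/2) \leq \exp\bigl(n(\tfrac{1}{2}\log 2p + \tfrac{1}{2}\log(2(1-p)))\bigr)$, can be compared with the exact binomial tail for small parameters. The sketch below is a numerical illustration only, with hypothetical function names.

```python
from math import comb, exp, log


def binomial_upper_tail(n, p, t):
    """Exact Pr(Bin(n, p) >= t), summed term by term."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(t, n + 1))


def chernoff_half_bound(n, p):
    """Chernoff upper bound on Pr(Bin(n, p) >= n/2), valid for 0 < p <= 1/2."""
    return exp(n * (0.5 * log(2 * p) + 0.5 * log(2 * (1 - p))))
```

For $n = 40$ and $p = 0.1$ the bound is roughly $1.3 \times 10^{-9}$, and the exact tail lies below it. At $p = 1/2$ the bound degenerates to 1, as expected.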

We comment that our approach subtly differs from Di Pietro et al. [4, 5],
in the following way. Let $B$ be the event that there exists a set $S$ of the
vertices of $G(n, m, k)$ with $|S| \leq \min\{m/k, n/2\}$ which is assigned $|S|k/4$
or fewer distinct colours. Di Pietro et al. show that this event occurs with
negligible probability, and then perform the rest of their analysis on the
random graph obtained from $G(n, m, k)$ under the assumption that $B$ does not
occur. The colours assigned to different vertices given that $B$ does not occur
are no longer independent, but Di Pietro et al. seem to assume independence
in their estimates. Our approach avoids this problem by considering the
individual events $B_S$ for a fixed subset $S$ of vertices (see the proof of Lemma 6
for example). The event $B_S$ only depends on the colours assigned to vertices
in $S$, so colours assigned to vertices not in $S$ are still chosen independently
when we assume that $B_S$ does not occur.
5 Discussion

We conjecture that it is possible to prove a sharper threshold for uniform
random intersection graphs. Indeed, we believe that the following conjecture
is true.

Conjecture. Let $k$ and $m$ be functions of $n$.

(i) Suppose that

$$\frac{k^2 n}{m} = \log n + \omega$$

where $\omega \to \infty$ as $n \to \infty$. Then almost surely $G(n, m, k)$ is connected.

(ii) Suppose that

$$\frac{k^2 n}{m} = \log n - \omega$$

where $\omega \to \infty$ as $n \to \infty$. Then almost surely $G(n, m, k)$ is not
connected.

The results in this paper show that Part (ii) of the conjecture holds (see
Theorem 2 in Section 2 above). Moreover the full conjecture holds in the
special case when $k = 2$ (by Theorem 3 in Section 3). To prove the full
conjecture, a natural approach would be to determine the correct
generalisation to hypergraphs of the threshold (8) for the near connectivity of
graphs. This might allow a proof along the lines of Section 3. However, as
far as the authors are aware, no sufficiently strong results for hypergraphs
are currently known: it would be interesting to see whether such results
could be established.

Let $p_{\mathrm{conn}}(n, m, k)$ be the probability that $G(n, m, k)$ is connected. It is
easy to show that the function $p_{\mathrm{conn}}(n, m, k)$ is non-decreasing in $k$. We
proved a special case of this fact in our comments below the statement
of Theorem 3, and essentially the same proof works in general. It seems
reasonable to believe that $p_{\mathrm{conn}}(n, m, k)$ is a non-increasing function of $m$
(so the probability that $G(n, m+1, k)$ is connected is no larger than the
probability that $G(n, m, k)$ is connected) but we are not able to find a proof
of this. Can a proof be found?
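
Although it proves nothing, the monotonicity question can at least be explored by simulation. The following is a minimal Monte Carlo sketch with hypothetical function names; estimates obtained this way carry sampling error, so small differences between neighbouring values of $m$ should not be over-interpreted.

```python
import random


def is_connected_labels(labels):
    """Depth-first search on the intersection graph defined by the labels."""
    n = len(labels)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in range(n):
            if w not in seen and labels[u] & labels[w]:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def estimate_pconn(n, m, k, trials, rng):
    """Fraction of sampled instances of G(n, m, k) that are connected."""
    hits = 0
    for _ in range(trials):
        labels = [frozenset(rng.sample(range(m), k)) for _ in range(n)]
        hits += is_connected_labels(labels)
    return hits / trials
```

Comparing `estimate_pconn(n, m, k, trials, rng)` with `estimate_pconn(n, m + 1, k, trials, rng)` over a range of parameters gives a rough empirical picture of how the connectivity probability changes with $m$.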